Overview

Dataset statistics

Number of variables: 9
Number of observations: 3000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 234.4 KiB
Average record size in memory: 80.0 B
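
The two memory figures above are consistent with nine 8-byte numeric columns plus an 8-byte index entry per row. That layout is an assumption (the report does not state the dtypes or index type), but a short arithmetic sketch under it reproduces the reported values:

```python
# Assumed layout (not stated in the report): 9 float64 columns and an
# index costing 8 bytes per row. Both are hypothetical but fit the figures.
ROWS = 3000
COLUMNS = 9
BYTES_PER_VALUE = 8

data_bytes = ROWS * COLUMNS * BYTES_PER_VALUE   # column data only
index_bytes = ROWS * BYTES_PER_VALUE            # assumed index cost
total_bytes = data_bytes + index_bytes

total_kib = total_bytes / 1024                  # total size in KiB
record_bytes = total_bytes / ROWS               # average bytes per row
# One column reported together with the (assumed) index:
column_kib = (ROWS * BYTES_PER_VALUE + index_bytes) / 1024

print(f"{total_kib:.1f} KiB, {record_bytes:.1f} B, {column_kib:.1f} KiB")
```

Under these assumptions the results agree with the reported 234.4 KiB total, 80.0 B per record, and the 46.9 KiB memory size listed for each variable.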

Variable types

Numeric: 9

Alerts

longitude is highly correlated with latitude [High correlation]
latitude is highly correlated with longitude [High correlation]
total_rooms is highly correlated with total_bedrooms and 2 other fields [High correlation]
total_bedrooms is highly correlated with total_rooms and 2 other fields [High correlation]
population is highly correlated with total_rooms and 2 other fields [High correlation]
households is highly correlated with total_rooms and 2 other fields [High correlation]
median_income is highly correlated with median_house_value [High correlation]
median_house_value is highly correlated with median_income [High correlation]
latitude is highly correlated with longitude and 1 other field [High correlation]
median_house_value is highly correlated with latitude and 1 other field [High correlation]

Reproduction

Analysis started: 2022-10-11 10:01:19.749428
Analysis finished: 2022-10-11 10:01:38.503267
Duration: 18.75 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 607
Distinct (%): 20.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5892
Minimum: -124.18
Maximum: -114.49
Zeros: 0
Zeros (%): 0.0%
Negative: 3000
Negative (%): 100.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: -124.18
5-th percentile: -122.47
Q1: -121.81
median: -118.485
Q3: -118.02
95-th percentile: -117.1
Maximum: -114.49
Range: 9.69
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 1.994936294
Coefficient of variation (CV): -0.01668157571
Kurtosis: -1.36277166
Mean: -119.5892
Median Absolute Deviation (MAD): 1.275
Skewness: -0.2978576326
Sum: -358767.6
Variance: 3.979770817
Monotonicity: Not monotonic
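
Several of the derived statistics above can be recomputed from the other reported figures. The following is a minimal sketch that uses only values copied from the longitude statistics; the variable names are illustrative and not part of the report.

```python
# Values copied from the longitude statistics in this report.
n = 3000
total = -358767.6
std = 1.994936294
minimum, maximum = -124.18, -114.49
q1, q3 = -121.81, -118.02

mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = standard deviation squared
value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(round(mean, 4), round(cv, 8), round(variance, 6),
      round(value_range, 2), round(iqr, 2))
```

The coefficient of variation comes out negative only because the mean longitude is negative; its magnitude is what indicates relative spread.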
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
-118.26 | 26 | 0.9%
-118.21 | 26 | 0.9%
-118.28 | 25 | 0.8%
-118.27 | 25 | 0.8%
-118.29 | 25 | 0.8%
-118.3 | 24 | 0.8%
-118.14 | 23 | 0.8%
-118.35 | 22 | 0.7%
-118.31 | 21 | 0.7%
-118.02 | 21 | 0.7%
Other values (597) | 2762 | 92.1%

Value | Count | Frequency (%)
-124.18 | 1 | < 0.1%
-124.17 | 1 | < 0.1%
-124.16 | 4 | 0.1%
-124.15 | 1 | < 0.1%
-124.14 | 3 | 0.1%
-124.1 | 1 | < 0.1%
-124.09 | 2 | 0.1%
-124.01 | 1 | < 0.1%
-123.92 | 1 | < 0.1%
-123.85 | 1 | < 0.1%

Value | Count | Frequency (%)
-114.49 | 1 | < 0.1%
-114.55 | 1 | < 0.1%
-114.61 | 1 | < 0.1%
-114.62 | 1 | < 0.1%
-114.98 | 1 | < 0.1%
-115.49 | 1 | < 0.1%
-115.52 | 1 | < 0.1%
-115.56 | 1 | < 0.1%
-115.57 | 4 | 0.1%
-115.59 | 1 | < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 587
Distinct (%): 19.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63539
Minimum: 32.56
Maximum: 41.92
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 32.56
5-th percentile: 32.82
Q1: 33.93
median: 34.27
Q3: 37.69
95-th percentile: 38.97
Maximum: 41.92
Range: 9.36
Interquartile range (IQR): 3.76

Descriptive statistics

Standard deviation: 2.129669523
Coefficient of variation (CV): 0.05976276739
Kurtosis: -1.12437247
Mean: 35.63539
Median Absolute Deviation (MAD): 1.25
Skewness: 0.4598159368
Sum: 106906.17
Variance: 4.535492279
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
34.02 | 35 | 1.2%
34.06 | 33 | 1.1%
34.05 | 32 | 1.1%
34.09 | 31 | 1.0%
34.11 | 31 | 1.0%
34.07 | 31 | 1.0%
33.93 | 30 | 1.0%
33.91 | 30 | 1.0%
33.84 | 28 | 0.9%
33.97 | 27 | 0.9%
Other values (577) | 2692 | 89.7%

Value | Count | Frequency (%)
32.56 | 1 | < 0.1%
32.57 | 3 | 0.1%
32.58 | 6 | 0.2%
32.59 | 2 | 0.1%
32.6 | 1 | < 0.1%
32.61 | 4 | 0.1%
32.62 | 2 | 0.1%
32.64 | 2 | 0.1%
32.66 | 3 | 0.1%
32.67 | 1 | < 0.1%

Value | Count | Frequency (%)
41.92 | 1 | < 0.1%
41.8 | 1 | < 0.1%
41.63 | 1 | < 0.1%
41.54 | 1 | < 0.1%
41.31 | 1 | < 0.1%
41.28 | 1 | < 0.1%
41.23 | 1 | < 0.1%
41.2 | 1 | < 0.1%
41.01 | 1 | < 0.1%
40.99 | 1 | < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.84533333
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.55539555
Coefficient of variation (CV): 0.4352660935
Kurtosis: -0.8037837284
Mean: 28.84533333
Median Absolute Deviation (MAD): 10
Skewness: 0.01851312116
Sum: 86536
Variance: 157.6379575
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
52 | 173 | 5.8%
35 | 118 | 3.9%
36 | 115 | 3.8%
16 | 107 | 3.6%
34 | 102 | 3.4%
17 | 100 | 3.3%
32 | 91 | 3.0%
26 | 88 | 2.9%
37 | 88 | 2.9%
25 | 86 | 2.9%
Other values (42) | 1932 | 64.4%

Value | Count | Frequency (%)
1 | 2 | 0.1%
2 | 6 | 0.2%
3 | 12 | 0.4%
4 | 28 | 0.9%
5 | 39 | 1.3%
6 | 25 | 0.8%
7 | 20 | 0.7%
8 | 25 | 0.8%
9 | 27 | 0.9%
10 | 30 | 1.0%

Value | Count | Frequency (%)
52 | 173 | 5.8%
51 | 11 | 0.4%
50 | 16 | 0.5%
49 | 21 | 0.7%
48 | 34 | 1.1%
47 | 22 | 0.7%
46 | 41 | 1.4%
45 | 51 | 1.7%
44 | 51 | 1.7%
43 | 56 | 1.9%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2215
Distinct (%): 73.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2599.578667
Minimum: 6
Maximum: 30450
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 6
5-th percentile: 585.9
Q1: 1401
median: 2106
Q3: 3129
95-th percentile: 6016.45
Maximum: 30450
Range: 30444
Interquartile range (IQR): 1728

Descriptive statistics

Standard deviation: 2155.593332
Coefficient of variation (CV): 0.8292087327
Kurtosis: 32.09994094
Mean: 2599.578667
Median Absolute Deviation (MAD): 815.5
Skewness: 4.167637359
Sum: 7798736
Variance: 4646582.611
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1778 | 5 | 0.2%
1966 | 5 | 0.2%
907 | 5 | 0.2%
1787 | 5 | 0.2%
2127 | 4 | 0.1%
1564 | 4 | 0.1%
1005 | 4 | 0.1%
1499 | 4 | 0.1%
1531 | 4 | 0.1%
2914 | 4 | 0.1%
Other values (2205) | 2956 | 98.5%

Value | Count | Frequency (%)
6 | 1 | < 0.1%
16 | 1 | < 0.1%
18 | 1 | < 0.1%
19 | 1 | < 0.1%
21 | 1 | < 0.1%
25 | 1 | < 0.1%
32 | 2 | 0.1%
38 | 1 | < 0.1%
40 | 1 | < 0.1%
41 | 1 | < 0.1%

Value | Count | Frequency (%)
30450 | 1 | < 0.1%
27870 | 1 | < 0.1%
24121 | 1 | < 0.1%
23915 | 1 | < 0.1%
21988 | 1 | < 0.1%
20354 | 1 | < 0.1%
18132 | 1 | < 0.1%
18123 | 1 | < 0.1%
17470 | 1 | < 0.1%
16590 | 1 | < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1055
Distinct (%): 35.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 529.9506667
Minimum: 2
Maximum: 5419
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 2
5-th percentile: 130.95
Q1: 291
median: 437
Q3: 636
95-th percentile: 1220.1
Maximum: 5419
Range: 5417
Interquartile range (IQR): 345

Descriptive statistics

Standard deviation: 415.6543681
Coefficient of variation (CV): 0.7843265313
Kurtosis: 28.53707082
Mean: 529.9506667
Median Absolute Deviation (MAD): 165
Skewness: 3.863393189
Sum: 1589852
Variance: 172768.5538
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
314 | 15 | 0.5%
270 | 12 | 0.4%
299 | 11 | 0.4%
274 | 10 | 0.3%
298 | 10 | 0.3%
348 | 10 | 0.3%
301 | 10 | 0.3%
528 | 10 | 0.3%
493 | 10 | 0.3%
292 | 10 | 0.3%
Other values (1045) | 2892 | 96.4%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
7 | 2 | 0.1%
8 | 5 | 0.2%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 3 | 0.1%

Value | Count | Frequency (%)
5419 | 1 | < 0.1%
5033 | 1 | < 0.1%
5027 | 1 | < 0.1%
4585 | 1 | < 0.1%
4522 | 1 | < 0.1%
4135 | 1 | < 0.1%
4055 | 1 | < 0.1%
3493 | 1 | < 0.1%
3173 | 1 | < 0.1%
2971 | 1 | < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1802
Distinct (%): 60.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1402.798667
Minimum: 5
Maximum: 11935
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 5
5-th percentile: 346.95
Q1: 780
median: 1155
Q3: 1742.75
95-th percentile: 3238.3
Maximum: 11935
Range: 11930
Interquartile range (IQR): 962.75

Descriptive statistics

Standard deviation: 1030.543012
Coefficient of variation (CV): 0.7346335842
Kurtosis: 16.44326818
Mean: 1402.798667
Median Absolute Deviation (MAD): 450
Skewness: 2.949670691
Sum: 4208396
Variance: 1062018.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
870 | 7 | 0.2%
753 | 6 | 0.2%
697 | 6 | 0.2%
881 | 6 | 0.2%
1211 | 6 | 0.2%
1581 | 5 | 0.2%
568 | 5 | 0.2%
769 | 5 | 0.2%
494 | 5 | 0.2%
1277 | 5 | 0.2%
Other values (1792) | 2944 | 98.1%

Value | Count | Frequency (%)
5 | 1 | < 0.1%
8 | 2 | 0.1%
14 | 2 | 0.1%
19 | 1 | < 0.1%
21 | 1 | < 0.1%
22 | 1 | < 0.1%
25 | 1 | < 0.1%
26 | 1 | < 0.1%
27 | 3 | 0.1%
29 | 1 | < 0.1%

Value | Count | Frequency (%)
11935 | 1 | < 0.1%
11139 | 1 | < 0.1%
10877 | 1 | < 0.1%
9419 | 1 | < 0.1%
8824 | 1 | < 0.1%
8768 | 1 | < 0.1%
8152 | 1 | < 0.1%
7604 | 1 | < 0.1%
7596 | 1 | < 0.1%
7560 | 1 | < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1026
Distinct (%): 34.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 489.912
Minimum: 2
Maximum: 4930
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 2
5-th percentile: 122.95
Q1: 273
median: 409.5
Q3: 597.25
95-th percentile: 1113
Maximum: 4930
Range: 4928
Interquartile range (IQR): 324.25

Descriptive statistics

Standard deviation: 365.4227098
Coefficient of variation (CV): 0.7458945888
Kurtosis: 26.22936135
Mean: 489.912
Median Absolute Deviation (MAD): 153.5
Skewness: 3.559753412
Sum: 1469736
Variance: 133533.7568
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
375 | 12 | 0.4%
273 | 12 | 0.4%
614 | 12 | 0.4%
340 | 11 | 0.4%
363 | 11 | 0.4%
429 | 11 | 0.4%
456 | 11 | 0.4%
335 | 11 | 0.4%
239 | 11 | 0.4%
287 | 11 | 0.4%
Other values (1016) | 2887 | 96.2%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 2 | 0.1%
7 | 2 | 0.1%
8 | 2 | 0.1%
9 | 5 | 0.2%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 3 | 0.1%

Value | Count | Frequency (%)
4930 | 1 | < 0.1%
4855 | 1 | < 0.1%
4176 | 1 | < 0.1%
3958 | 1 | < 0.1%
3293 | 1 | < 0.1%
3252 | 1 | < 0.1%
3197 | 1 | < 0.1%
2964 | 1 | < 0.1%
2651 | 1 | < 0.1%
2392 | 1 | < 0.1%

median_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2578
Distinct (%): 85.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8072718
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.56239
Q1: 2.544
median: 3.48715
Q3: 4.656475
95-th percentile: 6.97549
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.112475

Descriptive statistics

Standard deviation: 1.85451173
Coefficient of variation (CV): 0.4870972778
Kurtosis: 5.626184149
Mean: 3.8072718
Median Absolute Deviation (MAD): 1.02845
Skewness: 1.698511735
Sum: 11421.8154
Variance: 3.439213756
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
15.0001 | 9 | 0.3%
4 | 8 | 0.3%
3.375 | 8 | 0.3%
2.125 | 7 | 0.2%
3.875 | 7 | 0.2%
3.25 | 7 | 0.2%
2.75 | 7 | 0.2%
2.375 | 6 | 0.2%
3.6875 | 6 | 0.2%
3.625 | 6 | 0.2%
Other values (2568) | 2929 | 97.6%

Value | Count | Frequency (%)
0.4999 | 1 | < 0.1%
0.536 | 3 | 0.1%
0.5495 | 1 | < 0.1%
0.7054 | 1 | < 0.1%
0.7403 | 1 | < 0.1%
0.75 | 1 | < 0.1%
0.8054 | 1 | < 0.1%
0.8185 | 1 | < 0.1%
0.8252 | 1 | < 0.1%
0.844 | 1 | < 0.1%

Value | Count | Frequency (%)
15.0001 | 9 | 0.3%
14.2867 | 1 | < 0.1%
13.6623 | 1 | < 0.1%
12.8763 | 1 | < 0.1%
12.6417 | 1 | < 0.1%
12.3767 | 1 | < 0.1%
11.806 | 1 | < 0.1%
11.7794 | 1 | < 0.1%
11.5706 | 1 | < 0.1%
11.1978 | 1 | < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1784
Distinct (%): 59.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 205846.275
Minimum: 22500
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 46.9 KiB

Quantile statistics

Minimum: 22500
5-th percentile: 67785
Q1: 121200
median: 177650
Q3: 263975
95-th percentile: 465640
Maximum: 500001
Range: 477501
Interquartile range (IQR): 142775

Descriptive statistics

Standard deviation: 113119.6875
Coefficient of variation (CV): 0.5495347801
Kurtosis: 0.3953989964
Mean: 205846.275
Median Absolute Deviation (MAD): 68000
Skewness: 0.9895619132
Sum: 617538825
Variance: 1.279606369 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
500001 | 125 | 4.2%
137500 | 23 | 0.8%
162500 | 21 | 0.7%
225000 | 17 | 0.6%
350000 | 14 | 0.5%
87500 | 13 | 0.4%
100000 | 13 | 0.4%
187500 | 13 | 0.4%
112500 | 12 | 0.4%
150000 | 11 | 0.4%
Other values (1774) | 2738 | 91.3%

Value | Count | Frequency (%)
22500 | 1 | < 0.1%
37500 | 1 | < 0.1%
39200 | 1 | < 0.1%
39800 | 1 | < 0.1%
40000 | 1 | < 0.1%
41500 | 1 | < 0.1%
42500 | 1 | < 0.1%
42700 | 1 | < 0.1%
43100 | 1 | < 0.1%
43300 | 1 | < 0.1%

Value | Count | Frequency (%)
500001 | 125 | 4.2%
500000 | 4 | 0.1%
495800 | 1 | < 0.1%
495500 | 1 | < 0.1%
494700 | 1 | < 0.1%
493200 | 1 | < 0.1%
492300 | 1 | < 0.1%
492000 | 1 | < 0.1%
489800 | 1 | < 0.1%
487100 | 1 | < 0.1%

Interactions

[Figure: 81 pairwise interaction plots (9 × 9 variables); images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
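
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

For this toy input the ranks of x and y are identical, so ρ is 1 (up to floating-point error) while Pearson's r on the raw values stays below 1.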

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
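
As a minimal sketch of that formula, the following plain-Python function computes r and checks the location and scale invariance mentioned above. The function name `pearson_r` and the toy data are illustrative assumptions, not taken from the report.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors in covariance and standard deviations cancel out.
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)
```

Both calls return 0.8 up to floating-point error, which illustrates that changing the units or origin of either variable leaves r unchanged.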

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
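
The pair-counting definition above corresponds to the tau-a variant, which applies no correction for ties. The sketch below implements exactly that definition; the function name `kendall_tau_a` and the toy data are illustrative. Libraries such as SciPy default to the tie-corrected tau-b variant, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie, counted as neither
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)
```

Of the 10 pairs in this toy input, 8 are concordant and 2 are discordant, giving τ = 0.6.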

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value
0 | -122.05 | 37.37 | 27.0 | 3885.0 | 661.0 | 1537.0 | 606.0 | 6.6085 | 344700.0
1 | -118.30 | 34.26 | 43.0 | 1510.0 | 310.0 | 809.0 | 277.0 | 3.5990 | 176500.0
2 | -117.81 | 33.78 | 27.0 | 3589.0 | 507.0 | 1484.0 | 495.0 | 5.7934 | 270500.0
3 | -118.36 | 33.82 | 28.0 | 67.0 | 15.0 | 49.0 | 11.0 | 6.1359 | 330000.0
4 | -119.67 | 36.33 | 19.0 | 1241.0 | 244.0 | 850.0 | 237.0 | 2.9375 | 81700.0
5 | -119.56 | 36.51 | 37.0 | 1018.0 | 213.0 | 663.0 | 204.0 | 1.6635 | 67000.0
6 | -121.43 | 38.63 | 43.0 | 1009.0 | 225.0 | 604.0 | 218.0 | 1.6641 | 67000.0
7 | -120.65 | 35.48 | 19.0 | 2310.0 | 471.0 | 1341.0 | 441.0 | 3.2250 | 166900.0
8 | -122.84 | 38.40 | 15.0 | 3080.0 | 617.0 | 1446.0 | 599.0 | 3.6696 | 194400.0
9 | -118.02 | 34.08 | 31.0 | 2402.0 | 632.0 | 2830.0 | 603.0 | 2.3333 | 164200.0

Last rows

index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value
2990 | -118.23 | 34.09 | 49.0 | 1638.0 | 456.0 | 1500.0 | 430.0 | 2.6923 | 150000.0
2991 | -117.17 | 34.28 | 13.0 | 4867.0 | 718.0 | 780.0 | 250.0 | 7.1997 | 253800.0
2992 | -122.33 | 37.39 | 52.0 | 573.0 | 102.0 | 232.0 | 92.0 | 6.2263 | 500001.0
2993 | -117.91 | 33.60 | 37.0 | 2088.0 | 510.0 | 673.0 | 390.0 | 5.1048 | 500001.0
2994 | -117.93 | 33.86 | 35.0 | 931.0 | 181.0 | 516.0 | 174.0 | 5.5867 | 182500.0
2995 | -119.86 | 34.42 | 23.0 | 1450.0 | 642.0 | 1258.0 | 607.0 | 1.1790 | 225000.0
2996 | -118.14 | 34.06 | 27.0 | 5257.0 | 1082.0 | 3496.0 | 1036.0 | 3.3906 | 237200.0
2997 | -119.70 | 36.30 | 10.0 | 956.0 | 201.0 | 693.0 | 220.0 | 2.2895 | 62000.0
2998 | -117.12 | 34.10 | 40.0 | 96.0 | 14.0 | 46.0 | 14.0 | 3.2708 | 162500.0
2999 | -119.63 | 34.42 | 42.0 | 1765.0 | 263.0 | 753.0 | 260.0 | 8.5608 | 500001.0